Video tracking system and method

ABSTRACT

A video tracking system and method which includes a video camera having a selectively adjustable panning orientation, tilting orientation and focal length. A processor receives video images acquired by the camera. The processor is programmed to detect target objects in the images and selectively adjust the camera to track the target object. The camera is adjusted at variable rates which are selected as a function of a property, such as the velocity, of the target object. The focal length of the camera is selectively adjusted as a function of the distance of the target object from the camera. The images acquired by the camera are geometrically transformed to align images having different fields of view to facilitate the analysis of the images and thereby allowing the camera to be continuously adjustable for the production of video images having relatively smooth transitional movements.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates to a video camera system for tracking a moving object.

[0003] 2. Description of the Related Art

[0004] There are numerous known video surveillance systems which may be used to track a moving object such as a person or vehicle. Some such systems utilize a fixed camera having a stationary field of view (FOV). To fully cover a given surveillance site with a fixed camera system, however, it will oftentimes be necessary to use a significant number of fixed cameras.

[0005] Movable cameras which may pan, tilt and/or zoom may also be used to track objects. The use of a PTZ (pan, tilt, zoom) camera system will typically reduce the number of cameras required for a given surveillance site and also thereby reduce the number and cost of the video feeds and system integration hardware such as multiplexers and switchers associated therewith.

[0006] Visual surveillance systems will also often rely upon human operators. The use of human operators, however, is subject to several limiting factors such as relatively high hourly costs, susceptibility to fatigue when performing tedious and boring tasks, inability to concentrate on multiple images simultaneously and accidental/intentional human error. To reduce the impact of such human limitations, automated video tracking systems have been used to assist or replace human operators.

[0007] Three primary steps typically employed in automated video tracking systems involve background subtraction, target detection and target tracking. The use of fixed cameras greatly simplifies and speeds the background subtraction and target detection processes. When a PTZ system is employed, the camera is typically repositioned by analyzing the motion of the target object and predicting a future location of the target object. The camera is then adjusted to reposition the estimated future location of the target object in the center of the FOV. The camera may then remain stationary as the target object moves. The camera will then be repositioned to once again recenter the target object. Such discrete camera movements are continually repeated to track the target object. Conventionally, each discrete camera movement occurs at the fastest camera movement speeds available wherein each of the panning movements will be conducted at a common pan rate, each of the tilting movements will be conducted at a common tilt rate and each of the zooming movements, i.e., adjusting the focal length of the camera, will be conducted at a common zoom rate. The resulting series of discrete camera movements typically leads to a video image which is “jumpy” in comparison to a video image produced by the manual tracking of a target object by a skilled human operating a joystick or other camera control.

SUMMARY OF THE INVENTION

[0008] The present invention provides an automated video tracking system having a movable camera wherein the automatic adjustment of the camera when tracking a target object may be done continuously and at various speeds to provide a video image with relatively smooth transitional movements during the tracking of the target object.

[0009] The invention comprises, in one form thereof, a video tracking system which includes a video camera having a field of view wherein the camera is selectively adjustable and adjustment of the camera varies the field of view of the camera. Also included is at least one processor which is operably coupled to the camera. The processor receives video images acquired by the camera and selectively adjusts the camera. The processor is programmed to detect a moving target object in the video images and adjust the camera to track the target object wherein the processor adjusts the camera at a plurality of varied adjustment rates.

[0010] The invention comprises, in another form thereof, a video tracking system including a video camera having a field of view wherein the camera is selectively adjustable and adjustment of the camera varies the field of view of the camera. Also included in the system is at least one processor which is operably coupled to the camera. The processor receives video images acquired by the camera and selectively adjusts the camera. The processor is programmed to detect a moving target object in the video images and estimate a target value wherein the target value is a function of a property of the target object. The property may be the velocity of the target object. The processor adjusts the camera at a selected adjustment rate which is a function of the target value.

[0011] In alternative embodiments, such systems may include a processor which selects the adjustment rate of the camera as a function of at least one property of the target object. The at least one property of the target object may include the velocity of the target object. The camera may be selectively adjustable at a variable rate in adjusting at least one of a panning orientation of the camera and a tilt orientation of the camera.

[0012] The processor may also be programmed to select the adjustment rate of the camera based upon analysis of a first image and a second image wherein the first image is acquired by the camera adjusted to define a first field of view and the second image is acquired by the camera adjusted to define a second field of view. The first and second fields of view may be partially overlapping and the determination of the selected adjustment rate by the processor may include identifying and aligning at least one common feature represented in each of the first and second images. The camera may also define a third field of view as the camera is being adjusted at the selected adjustment rate with a third image being acquired by the camera when it defines the third field of view and wherein the first, second and third images are consecutively analyzed by the processor. The camera may have a selectively adjustable focal length and the processor may select the focal length of the camera as a function of the distance of the target object from the camera.

[0013] The adjustment of the camera may include selective panning movement of the camera wherein the panning movement defines an x-axis, selective tilting movement of the camera wherein the tilting movement defines a y-axis, and selective focal length adjustment of the camera wherein adjustment of the focal length defines a z-axis with the x, y and z axes being oriented mutually perpendicular. The processor may adjust the camera at a selected panning rate which is a function of the velocity of said target object along the x-axis and at a selected tilting rate which is a function of the velocity of the target object along the y-axis. The camera may also be adjusted at a first selected adjustment rate until the processor selects a second adjustment rate and communicates the second adjustment rate to the camera.

[0014] The tracking system may also include a display device and an input device operably coupled to said system wherein an operator may view the video images on the display device and input commands or data into the system through the input device. The display device and input device may be positioned remotely from said camera.

[0015] The invention comprises, in yet another form thereof, a video tracking system including a video camera having a field of view wherein the camera is selectively adjustable and adjustment of the camera varies the field of view of the camera. The system also includes at least one processor operably coupled to the camera. The processor receives video images acquired by the camera and selectively adjusts the camera. The processor is programmed to detect a moving target object in the video images and adjust the camera and track the target object. During tracking of the target object, the processor communicates a plurality of commands to the camera and the camera is continuously and variably adjustable in accordance with the commands without intervening stationary intervals.

[0016] The camera of such a system may be selectively adjustable at a variable rate in adjusting at least one, or each, of a panning orientation of the camera and a tilt orientation of the camera. The camera may acquire images for analysis by the processor while being adjusted and the continuous and variable adjustment of the camera includes varying either a direction of adjustment or a rate of adjustment. The commands may involve a first command which adjusts the camera at a selected rate and direction until a second command is received by the camera.

[0017] The invention comprises, in still another form thereof, a video tracking system including a video camera having a field of view wherein the camera is selectively adjustable and adjustment of the camera varies the field of view of the camera. The system also includes at least one processor operably coupled to the camera wherein the processor receives video images acquired by the camera and selectively adjusts the camera. The processor is programmed to detect a moving target object in the video images and adjust the camera and track the target object. The processor can consecutively analyze first, second and third images acquired by the camera wherein each of the images records a different field of view. The processor communicates to the camera a first command selectively adjusting the camera and a second command selectively adjusting the camera. The camera is adjusted in accordance with the first command during at least a portion of a first time interval between acquisition of the first and second images. The camera is adjusted in accordance with the second command during at least a portion of a second time interval between acquisition of the second and third images. The camera is continuously adjusted between acquisition of the first image and the third image.

[0018] The invention comprises, in another form thereof, a method of tracking a target object with a video camera. The method includes providing a video camera which has a field of view and is selectively adjustable wherein adjustment of the camera varies the field of view of the camera. The method also includes adjusting the camera at a selectively variable adjustment rate to track a target object. The adjustment rate may be selected as a function of at least one property of the target object.

[0019] The invention comprises, in yet another form thereof, a method of tracking a target object with a video camera. The method includes providing a video camera which has a field of view and is selectively adjustable wherein adjustment of the camera varies the field of view of the camera. The method also includes detecting a target object in images acquired by said camera, estimating a target value which is a function of at least one property of the target object and adjusting the camera at a selectively variable rate wherein the adjustment rate of the camera is selected as a function of the target value.

[0020] In alternative embodiments of the above-described methods, the at least one property of the target object may include the velocity of the target object. The adjustment rate may be selected based upon analysis of a first image and a second image wherein the first image is acquired by the camera when adjusted to define a first field of view and the second image is acquired by the camera when adjusted to define a second field of view. The first and second fields of view may be partially overlapping and the determination of the adjustment rate may include identifying and aligning at least one common feature represented in each of the first and second images. The adjusting of the camera at a selectively variable adjustment rate may include adjusting at least one, or each, of a panning orientation of the camera and a tilt orientation of the camera and the selected variable adjustment rates may be selected as a function of the velocity of the target object. The determination of the adjustment rates may also involve the use of a proportionality factor which is a function of the real world distance of the target object from the camera. The adjustment of the camera may also include adjusting the camera at a first selected adjustment rate until a second selected adjustment rate is communicated to the camera.
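The relationship between target velocity, a distance-dependent proportionality factor and a held adjustment rate can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the disclosure: `gain_for_distance` assumes the small-angle relation (angular rate = transverse velocity / distance), `select_rate` clamps the result to an assumed maximum camera rate, and `HoldLastCommandCamera` is a toy model of a camera that keeps moving at the last commanded rate until a new rate arrives.

```python
def gain_for_distance(distance_m):
    """Assumed proportionality factor: rad/s of camera motion per m/s of target motion."""
    return 1.0 / distance_m


def select_rate(velocity, gain, max_rate):
    """Signed adjustment rate proportional to velocity, clamped to +/- max_rate."""
    rate = gain * velocity
    return max(-max_rate, min(max_rate, rate))


class HoldLastCommandCamera:
    """Toy camera: moves at the last commanded rates until a new command is received."""

    def __init__(self):
        self.pan = 0.0
        self.tilt = 0.0
        self.pan_rate = 0.0
        self.tilt_rate = 0.0

    def command(self, pan_rate, tilt_rate):
        # A new command replaces the rates; the camera never has to stop in between.
        self.pan_rate = pan_rate
        self.tilt_rate = tilt_rate

    def step(self, dt):
        # Advance the orientation by the currently held rates.
        self.pan += self.pan_rate * dt
        self.tilt += self.tilt_rate * dt


# Target 2 m away moving at 2.0 m/s along x and -1.0 m/s along y (illustrative values).
gain = gain_for_distance(2.0)
cam = HoldLastCommandCamera()
cam.command(select_rate(2.0, gain, 10.0), select_rate(-1.0, gain, 10.0))
cam.step(0.2)  # first inter-frame interval
cam.step(0.2)  # same command still in force during the second interval
```

In this sketch the camera is never stationary between frames; a later `command` call would simply change the rate or direction in mid-motion.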

[0021] The invention comprises, in another form thereof, a method of tracking a target object with a video camera. The method includes providing a video camera which has a field of view and is selectively adjustable wherein adjustment of the camera varies the field of view of the camera. The method also includes adjusting the camera to track a target object wherein the adjustment of the camera includes selectively and variably adjusting at least one adjustment parameter and wherein the camera is continuously adjustable during the selective and variable adjustment of the at least one adjustment parameter.

[0022] The selective and variable adjustment of at least one adjustment parameter of the camera may include the adjustment of at least one, or each, of a panning orientation of said camera and a tilt orientation of said camera. The adjustment of such parameters may be selective and variable. The selective and variable adjustment of such parameters may include the varying of either the direction of adjustment or the rate of adjustment and the rate of adjustment may be selected as a function of the velocity of the target object.

[0023] The invention comprises, in another form thereof, a method of tracking a target object with a video camera. The method includes providing a video camera which has a field of view and is selectively adjustable wherein adjustment of the camera varies the field of view of the camera. The method also includes detecting a target object in images acquired by the camera and acquiring first, second and third images wherein each of the first, second and third images record a different field of view. The method also includes communicating a first command to the camera selectively adjusting the camera and communicating a second command to the camera selectively adjusting the camera. Further included is the step of continuously adjusting the camera between acquisition of the first image and acquisition of the third image wherein the camera is adjusted in accordance with the first command during at least a portion of a first time interval between acquisition of the first image and acquisition of the second image and the camera is adjusted in accordance with the second command during at least a portion of a second time interval between acquisition of the second image and acquisition of the third image.

[0024] The first and second commands may selectively adjust at least one, or each, of a panning orientation of the camera and a tilt orientation of the camera. The adjustment of such parameters may be at a selectively variable adjustment rate and the rates may be selected as a function of the velocity of the target object.

[0025] The invention comprises, in yet another form thereof, a video tracking system having a video camera with a selectively adjustable focal length. Also included is at least one processor operably coupled to said camera wherein the processor receives video images acquired by the camera and selectively adjusts the focal length of the camera. The processor is programmed to detect a moving target object in the video images and adjust the focal length of the camera as a function of the distance of the target object from the camera. The camera of the system may also have a selectively adjustable panning orientation and a selectively adjustable tilting orientation wherein the processor adjusts the panning orientation and the tilting orientation to maintain the target object centered in the video images and selectively adjusts the focal length of the camera as a function of the tilt angle.

[0026] The invention comprises, in still another form thereof, a method of automatically tracking a target object with a video camera. The method includes providing a video camera having a selectively adjustable focal length and adjusting the focal length of the camera as a function of the distance of the target object from the camera. The camera used with such a method may also have a selectively adjustable panning orientation and a selectively adjustable tilting orientation wherein tracking the object involves adjusting the panning and tilting orientation of the camera and selectively adjusting the focal length of the camera as a function of the tilt angle of the camera.

[0027] An advantage of the present invention is that it provides video images which reflect relatively fluid transitional camera movements during the tracking of the target object and which do not “jump” from point to point when tracking the target object. The resulting video is typically regarded as more pleasant to view and less distracting to human operators who are viewing the video to observe the behavior of the target object.

[0028] Another advantage of the present invention is that it allows for images acquired for automatic tracking purposes to be obtained while the camera is in motion and thus does not require the camera to rest in a stationary position for image acquisition during the tracking of a target object.

[0029] Yet another advantage of the present invention is that it allows the system to continue tracking a target object while a human operator manually repositions the camera because the tracking system may utilize a series of images which do not have a common field of view to track the target object.

[0030] Still another advantage of the present invention is that it may be used with conventional pan, tilt, zoom (PTZ) cameras and, thus, facilitates the retrofitting and upgrading of existing installations having such conventional PTZ cameras.

BRIEF DESCRIPTION OF THE DRAWINGS

[0031] The above mentioned and other features and objects of this invention, and the manner of attaining them, will become more apparent and the invention itself will be better understood by reference to the following description of an embodiment of the invention taken in conjunction with the accompanying drawings, wherein:

[0032] FIG. 1 is a schematic view of a video surveillance system in accordance with the present invention.

[0033] FIG. 2 is a schematic view of the automated tracking unit.

[0034] FIG. 3 is a flow chart representing the operation of the video surveillance system.

[0035] FIG. 4 is a flow chart representing the different status levels of the tracking unit.

[0036] FIG. 5 is a flow chart representing the reacquisition subroutine which is used when the target object is lost.

[0037] Corresponding reference characters indicate corresponding parts throughout the several views. Although the exemplification set out herein illustrates an embodiment of the invention, in one form, the embodiment disclosed below is not intended to be exhaustive or to be construed as limiting the scope of the invention to the precise form disclosed.

DESCRIPTION OF THE PRESENT INVENTION

[0038] In accordance with the present invention, a video surveillance system 20 is shown in FIG. 1. System 20 includes a camera 22 which is located within a partially spherical enclosure 24. Enclosure 24 is tinted to allow the camera to acquire images of the environment outside of enclosure 24 and simultaneously prevent individuals in the environment being observed by camera 22 from determining the orientation of camera 22. Camera 22 includes a controller and motors which provide for the panning, tilting and adjustment of the focal length of camera 22. Panning movement of camera 22 is represented by arrow 26, tilting movement of camera 22 is represented by arrow 28 and the changing of the focal length of the lens 23 of camera 22, i.e., zooming, is represented by arrow 30. As shown with reference to coordinate system 21, panning motion may track movement along the x-axis, tilting motion may track movement along the y-axis and focal length adjustment may be used to track movement along the z-axis. In the illustrated embodiment, camera 22 and enclosure 24 are a Philips AutoDome® Camera Systems brand camera system, such as the G3 Basic AutoDome® camera and enclosure, which are available from Bosch Security Systems, Inc., formerly Philips Communication, Security & Imaging, Inc., having a place of business in Lancaster, Pa. A camera suited for use with the present invention is described by Sergeant et al. in U.S. Pat. No. 5,627,616 entitled Surveillance Camera System which is hereby incorporated herein by reference.

[0039] System 20 also includes a head end unit 32. Head end unit 32 may include a video switcher or a video multiplexer (not shown). For example, the head end unit may include an Allegiant brand video switcher available from Bosch Security Systems, Inc., formerly Philips Communication, Security & Imaging, Inc., of Lancaster, Pa., such as a LTC 8500 Series Allegiant Video Switcher which provides inputs for up to 64 cameras and may also be provided with eight independent keyboards and eight monitors. Head end unit 32 includes a keyboard 34 and joystick 36 for operator input and a display device 38 for viewing by the operator. A 24 volt a/c power source is provided to power both camera 22 and an automated tracking unit 50.

[0040] Illustrated system 20 is a single camera application; however, the present invention may be used within a larger surveillance system having additional cameras which may be either stationary or moveable cameras or some combination thereof to provide coverage of a larger or more complex surveillance area. One or more VCRs may also be connected to head end unit 32 to provide for the recording of the video images captured by camera 22 and other cameras in the system.

[0041] The hardware architecture of tracking unit 50 is schematically represented in FIG. 2. A power line 42 connects power source 40 to converter 52 to power tracking unit 50. Tracking unit 50 receives a video feed from camera 22 via video line 44 and video line 45 is used to communicate video images to head end unit 32. In the illustrated embodiment, video lines 44, 45 are coaxial, 75 ohm, 1 Vp-p and include BNC connectors for engagement with tracking unit 50. The video images provided by camera 22 are analog and may conform to either NTSC or PAL standards. When tracking unit 50 is inactive, i.e., turned off, video images from camera 22 pass through tracking unit 50 to head end unit 32 as shown by analog video line 54. A MOSFET based circuit provides a video input buffer 56, and video decoder 58 performs video decoding and passes the digitized video images to processor 60. In the illustrated embodiment, video input is no greater than 1 Vp-p and if the video signal exceeds 1 Vp-p it will be clipped to 1 Vp-p. Video processing is performed by processor 60 running software which is described in greater detail below. Processor 60 may be a TriMedia TM-1300 programmable media processor available from Philips Electronics North America Corporation. At start up, processor 60 loads a bootloader program from serial EEPROM 62. The boot program then copies the application code from flash memory 64 to SDRAM 66 for execution. In the illustrated embodiment, flash memory 64 provides 1 megabyte of memory and SDRAM 66 provides 8 megabytes of memory. Since the application code from flash memory 64 is loaded on SDRAM 66 upon start up, SDRAM 66 is left with approximately 7 megabytes of memory for video frame storage.

[0042] As shown in FIG. 2, a video data bus and I2C bus connect processor 60 with video decoder 58, an I2C bus connects processor 60 with EEPROM 62, an XIO bus connects processor 60 with flash memory 64, an SDRAM bus connects processor 60 with SDRAM 66 and an XIO bus connects processor 60 with UART 68. UART 68 is used for serial communications and general purpose input/output. UART 68 has a 16 character FIFO buffer, a 6-bit input port and an 8-bit output port that is used to drive status LED 70, error LED 72 and output relay 74 through the use of small signal transistors. Relay line 49 communicates the status of double pole, single throw relay 74 to head end unit 32. An RS-232 level convertor 76 provides communication between UART 68 and RS-232 serial line 48. The characteristics of RS-232 line 48 and the communications conveyed thereby in the illustrated embodiment are a 3 wire connection, 19200 baud, 8 data bits, no parity, 1 stop bit and no handshaking.
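As a worked example of what the stated serial settings imply, the short calculation below estimates the character throughput of the link and how long the 16 character FIFO takes to fill at the full line rate. It assumes standard asynchronous framing with one start bit per character, which the text does not state explicitly; the constant names are illustrative only.

```python
BAUD = 19200        # bits per second, from the text
DATA_BITS = 8       # from the text
PARITY_BITS = 0     # "no parity"
STOP_BITS = 1       # from the text
START_BITS = 1      # assumption: standard asynchronous framing
FIFO_CHARS = 16     # UART FIFO depth, from the text

# Each character occupies start + data + parity + stop bits on the wire.
bits_per_char = START_BITS + DATA_BITS + PARITY_BITS + STOP_BITS

# Maximum sustained character rate on the link.
chars_per_second = BAUD / bits_per_char

# Time for back-to-back characters to fill the FIFO if nothing drains it.
fifo_fill_time_ms = 1000.0 * FIFO_CHARS / chars_per_second

print(bits_per_char, chars_per_second, round(fifo_fill_time_ms, 2))
```

Under these assumptions the link carries at most 1920 characters per second, so the FIFO would need to be serviced roughly every 8.3 ms to avoid overrun when data arrives continuously.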

[0043] In the illustrated embodiment, the only commands conveyed to tracking unit 50 which are input by a human operator are on/off commands. Such on/off commands and other serial communications between head end unit 32 and tracking unit 50 are conveyed by bi-phase line 46 from head end unit 32 to camera 22 and to tracking unit 50 from camera 22 via RS-232 line 48. In the illustrated embodiment, tracking unit 50 is provided with a sheet metal housing and mounted proximate camera 22. Alternative hardware architecture may also be employed with tracking unit 50. Such hardware should be capable of running the software described below and processing at least approximately 5 frames per second for best results.

[0044] Tracking unit 50 performs several functions: it controls video decoder 58 and captures video frames acquired by camera 22; it registers video frames taken at different times to remove the effects of camera motion; it performs a video content analysis to detect target objects which are in motion within the FOV of camera 22; it calculates the relative direction, speed and size of the detected target objects; it sends direction and speed commands to camera 22; it performs all serial communications associated with the above functions; and it controls the operation of the status indicators 70, 72 and relay 74.

[0045] The operation of system 20 will now be described in greater detail. When tracking unit 50 is first activated, the first step involves initializing camera 22 and positioning camera 22 to watch for a person or moving object to enter the FOV of camera 22 by repeatedly acquiring 24-bit YUV color images at either NTSC or PAL CIF resolution. Alternatively, camera 22 may be moved through a predefined “tour” of the surveillance area after initialization and watch for a person or other moving object to enter the FOV of camera 22 as camera 22 searches the surveillance area. For reference purposes, two images or frames acquired by camera 22 for analysis will be labeled:

I₁, I₂

[0046] In the exemplary embodiment, camera 22 is continually acquiring new images and the computational analysis performed by processor 60 to compare the current image with a reference image takes longer than the time interval between the individual images acquired by camera 22. When processor 60 completes its analysis, it will grab a new image for analysis. The time interval between two images which are consecutively grabbed by processor 60 is assumed to be constant by illustrated tracking unit 50. Although the time interval between two consecutively grabbed images may differ slightly, the variations are considered sufficiently small, and the processing efficiencies achieved sufficiently great, to justify this assumption. As used herein unless otherwise indicated, the term consecutive images refers to images which are consecutively grabbed by processor 60 for analysis as opposed to images which are consecutively acquired by camera 22. A QCIF resolution sub-sample (i.e., an image having a quarter of the resolution of the NTSC or PAL CIF resolution image) of the current I₁ and I₂ images is created. The sub-sample groups adjacent pixels together to define an average value for the grouped pixels. The purpose of the sub-sampling process is to reduce the time consumed by motion detection. A second sub-sample of the first sub-sample (resulting in images having 1/16 the resolution of the original CIF resolution images) may also be taken to further increase the speed of the motion detection process. Such sub-sampling, however, reduces the resolution of the images and can potentially degrade the ability of the system to detect the features and targets which are the subjects of interest. For reference purposes these sub-sampled images are labeled:
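The sub-sampling described above can be sketched as a 2×2 block average, which quarters the pixel count at each level. This is only one plausible reading of "groups adjacent pixels together"; the function name `subsample_2x2` and the tiny test image are illustrative, and a real implementation would operate on decoded video frames rather than nested lists.

```python
def subsample_2x2(image):
    """Average each 2x2 block of a 2-D list of intensities.

    A trailing odd row or column, if any, is dropped.
    """
    rows = len(image) // 2
    cols = len(image[0]) // 2
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
                     + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1])
            row.append(total / 4.0)
        out.append(row)
    return out


# Illustrative 4x4 "frame".
frame = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [10, 10, 20, 20],
    [10, 10, 20, 20],
]
level1 = subsample_2x2(frame)    # quarter of the original pixel count
level2 = subsample_2x2(level1)   # 1/16 of the original pixel count
```

Applying the function twice corresponds to the optional second sub-sample, trading resolution for a faster motion detection pass.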

I₁¹, I₁², I₂¹, I₂²

[0047] If only a single sub-sample of each image is taken, these sub-samples are labeled:

I₁¹, I₂¹

[0048] Alternatively, these subsamples may be labeled ¹I₁ and ¹I₂.

[0049] Target Object Detection

[0050] Initially, the camera may be stationary and monitoring a specific location for a moving target object. System 20 looks for a moving target object by computing the image difference between the two most current images every time a new frame is grabbed by processor 60. The image difference is calculated by taking the absolute value of the difference between associated pixels of each image. When images I₁ and I₂ are aligned, either because camera 22 took each image with the same FOV or because one of the images was mapped to the second image, the image difference, Δ, is calculated in accordance with the following equation:

Δ = |I₂ − I₁|

[0051] A histogram of these differences is then calculated. If there is a moving target in the two images, the histogram will usually have two peaks associated with it. The largest peak will typically be centered around zero and corresponds to the static regions of the image. The second major peak represents the pixels where changes in image intensity are high and corresponds to the moving areas within the image, i.e., a moving target object. The pixels associated with the second peak can be considered as outliers to the original Gaussian distribution. Since they will typically constitute less than 50% of the total number of pixels in the illustrated embodiment, they are detected using the estimation technique Least Median of Squares.
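The difference-and-outlier step can be illustrated with a minimal sketch. It computes the per-pixel absolute difference and then applies a one-dimensional Least Median of Squares location estimate, taken here as the midpoint of the shortest interval covering just over half of the values. The scale factor 1.4826·(1 + 5/(n − 1)) and the 2.5 cutoff are the conventional choices from the robust statistics literature, not values given in the text, and `min_scale` is a hypothetical floor to avoid a zero scale when the static pixels are identical.

```python
def abs_difference(img1, img2):
    """Per-pixel absolute difference of two aligned, equally sized images."""
    return [[abs(b - a) for a, b in zip(row1, row2)]
            for row1, row2 in zip(img1, img2)]


def lmeds_location_scale(values):
    """1-D LMedS: center of the shortest window holding h = n//2 + 1 values."""
    xs = sorted(values)
    n = len(xs)
    h = n // 2 + 1
    best = min(range(n - h + 1), key=lambda i: xs[i + h - 1] - xs[i])
    lo, hi = xs[best], xs[best + h - 1]
    center = (lo + hi) / 2.0
    half_width = (hi - lo) / 2.0  # square root of the minimized h-th squared residual
    scale = 1.4826 * (1.0 + 5.0 / (n - 1)) * half_width
    return center, scale


def moving_pixels(diff, threshold=2.5, min_scale=1.0):
    """Return (row, col) of pixels whose difference is an outlier to the static peak."""
    flat = [v for row in diff for v in row]
    center, scale = lmeds_location_scale(flat)
    scale = max(scale, min_scale)  # assumed floor, see lead-in
    return [(r, c)
            for r, row in enumerate(diff)
            for c, v in enumerate(row)
            if abs(v - center) / scale > threshold]


# Illustrative frames: static background of 10 with slight noise and a bright moving blob.
i1 = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
i2 = [[10, 11, 10, 9], [10, 90, 95, 10], [11, 92, 10, 10], [10, 10, 9, 10]]
delta = abs_difference(i1, i2)
motion = moving_pixels(delta)
```

Because the estimator only needs a majority of the pixels to belong to the static peak, it remains usable as long as the moving region covers less than half of the image, which matches the condition stated above.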

[0052] An alternative method that may be used with the present invention and which provides for the manual identification of a target object for tracking purposes is discussed by Trajkovic et al. in U.S. Pat. App. Pub. 2002/0140813 A1 entitled Method For Selecting A Target In An Automated Video Tracking System which is hereby incorporated herein by reference. A method for detecting motion of target objects that may be used with the present invention is discussed by Trajkovic in U.S. Pat. App. Pub. 2002/0168091 A1 entitled Motion Detection Via Image Alignment which is hereby incorporated herein by reference.

[0053] Identification of Point of Interest

[0054] After detecting motion, a point of interest (POI) corresponding to the centroid of the moving target object is then identified. By calculating the convolution with Sobel operators of arbitrary order, the Sobel edge detection masks look for edges in both the horizontal and vertical directions and then combine this information into a single metric as is known in the art. More specifically, at each pixel both the Sobel X and Sobel Y operators are used to generate gradient values for that pixel. They are labeled gx and gy respectively. The edge magnitude is then calculated by equation (1):

$\text{EdgeMagnitude} = \sqrt{gx^{2} + gy^{2}} \qquad (1)$
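A minimal sketch of equation (1) using the common 3×3 Sobel masks is shown below. The mask coefficients are the conventional ones and the border handling (border pixels left at zero) is an arbitrary choice for illustration; the source does not specify either. The loop applies the masks by correlation rather than flipped convolution, which only changes the sign of gx and gy and therefore leaves the magnitude unchanged.

```python
import math

# Conventional 3x3 Sobel masks (assumed; the text does not list coefficients).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edge_magnitude(image):
    """Return sqrt(gx^2 + gy^2) per pixel; border pixels are left at 0.0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = 0.0
            gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    p = image[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * p
                    gy += SOBEL_Y[dr + 1][dc + 1] * p
            out[r][c] = math.sqrt(gx * gx + gy * gy)
    return out


# Illustrative image containing a vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10], [0, 0, 10, 10], [0, 0, 10, 10]]
edges = sobel_edge_magnitude(step)
```

For the step image the horizontal gradient dominates and the vertical gradient is zero, so the edge magnitude at the two interior pixels equals the absolute value of gx.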

[0055] The edge of the moving target object will have large edge magnitude values and these values are used to define the edges of the target object. The centroid of the target object or area of motion is found by using the median and sigma values of the areas of detected motion. The centroid, which is the point of interest or POI, is then found in both frames and its image position coordinates are stored as (x(0), y(0)) and (x(1), y(1)).
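The text does not spell out how the median and sigma values are combined, so the following is only one possible interpretation, offered as a sketch: take the median of the motion pixel coordinates, discard pixels lying more than `k` standard deviations from that median on either axis, and take the median of what remains. The function name `robust_centroid` and the parameter `k` are hypothetical.

```python
import statistics


def robust_centroid(points, k=2.0):
    """Median-based centroid of (x, y) motion pixels with sigma-based rejection."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.median(xs), statistics.median(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    # Keep only pixels within k sigma of the median on both axes.
    kept = [(x, y) for x, y in points
            if abs(x - mx) <= k * sx and abs(y - my) <= k * sy]
    if not kept:
        kept = points  # fall back to all pixels if everything was rejected
    return (statistics.median([p[0] for p in kept]),
            statistics.median([p[1] for p in kept]))


# Illustrative motion pixels: a compact cluster plus one stray detection.
motion_pixels = [(10, 20), (11, 20), (10, 21), (11, 21), (12, 22), (100, 5)]
poi = robust_centroid(motion_pixels)
```

Using the median rather than the mean keeps a single stray detection from dragging the POI away from the target, which matters because the POI position feeds directly into the velocity estimate.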

[0056] Three related coordinate systems may be used to describe the position of the POI: its real world coordinates (X, Y, Z) corresponding to coordinate system 21 shown in FIG. 1, its image projection coordinates (x, y) and its camera coordinates (α, β, k) which correspond to the camera pan angle, camera tilt angle and the linear distance to the POI. The two positions of the POI captured by the two images allow for the determination of the 3-D position of the POI in both frames as well as the relative velocity of the POI during the time interval between the two frames. A simplified representation of the moving person or target object in the form of the 2-D location in the image is used in this determination process.

[0057] Tracking unit 50 does not require the two images which are used to determine the motion of the POI to be taken with the camera having the same pan, tilt and focal length settings for each image. Instead, tracking unit 50 maps or aligns one of the images with the other image and then determines the relative velocity and direction of movement of the POI. Two alternative methods of determining the velocity and direction of the POI motion are described below. The first method described below involves the use of a rotation matrix R while the second method uses a homography matrix determined by matching and aligning common stationary features which are found in each of the two images being analyzed.

[0058] Rotation Matrix Method

[0059] When camera 22 is pointing in a direction determined by pan and tilt angles α and β respectively, the rotation matrix, R, determined by these angles is given by: $\begin{matrix}{R = \begin{bmatrix}1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha\end{bmatrix}\begin{bmatrix}\cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta\end{bmatrix} = \begin{bmatrix}\cos\beta & 0 & \sin\beta \\ \sin\alpha\sin\beta & \cos\alpha & -\sin\alpha\cos\beta \\ -\cos\alpha\sin\beta & \sin\alpha & \cos\alpha\cos\beta\end{bmatrix} = \begin{bmatrix}r_{1}^{T} \\ r_{2}^{T} \\ r_{3}^{T}\end{bmatrix}} & (2)\end{matrix}$

[0060] For an arbitrary point having image projection coordinates (x, y), the relation between the world coordinates, P_(w), of an arbitrary point P and its camera coordinates, P_(c), is given as:

P _(w) =RP _(c)

[0061] and the relation between the world coordinates and the image projection coordinates (x, y) is given by: $x = f\frac{r_{1}^{T}P_{w}}{r_{3}^{T}P_{w}} + x_{0}$, $y = f\frac{r_{2}^{T}P_{w}}{r_{3}^{T}P_{w}} + y_{0}$

[0062] wherein f is the focal length of the camera, (x, y) are the current image projection coordinates of the POI, and (x₀, y₀) are the previous image projection coordinates of the POI. Using the above equations, with P(0) denoting the world coordinates P_(w) of the POI in the first image: $\begin{matrix}{x(0) = f\frac{r_{1}^{T}P(0)}{r_{3}^{T}P(0)} + x_{0} \Rightarrow r_{3}^{T}P(0)\left( x(0) - x_{0} \right) = f\,r_{1}^{T}P(0)} & (3a) \\ {y(0) = f\frac{r_{2}^{T}P(0)}{r_{3}^{T}P(0)} + y_{0} \Rightarrow r_{3}^{T}P(0)\left( y(0) - y_{0} \right) = f\,r_{2}^{T}P(0)} & (3b)\end{matrix}$

[0063] Assuming the target object to be a person of average height, the height can be considered a constant (i.e., Z(0)=Z=Constant) and equations (3a) and (3b) will represent a linear system with two unknowns (X(0), Y(0)) which is easily solved. The position of the POI in the second image, (X(1), Y(1)), can be computed in a similar manner, and the real world velocity of the target object in the x and y directions, X′ and Y′ respectively, can be found by:

X′=X(1)−X(0)  (3c)

Y′=Y(1)−Y(0)  (3d)

[0064] Although the values for X′ and Y′ obtained in accordance with equations (3c) and (3d) are literally distances, the time interval between consecutive images grabbed by processor 60 will be substantially constant as discussed above and, thus, the distance traveled by the target object during all such constant time intervals is directly proportional to the velocity of the target object and may be used as a proxy for the average velocity of the target object during the time interval between the acquisition of the two images. The sign of the velocity values is indicative of the direction of motion of the POI. In alternative embodiments, the actual velocity may be calculated and/or images acquired at more varied time intervals may be used. With this knowledge of the velocity and direction of motion of the POI, the pan and tilt velocity of camera 22 can be controlled to keep the target object centered within the FOV of camera 22.
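The linear system formed by equations (3a) and (3b) can be sketched as follows. The sketch assumes angles in radians, builds the rows r₁, r₂, r₃ of the rotation matrix of equation (2), and solves for (X, Y) by Cramer's rule with Z held constant. The function names and the numeric values are illustrative assumptions only.

```python
import math


def rotation_rows(alpha, beta):
    """Rows r1, r2, r3 of the rotation matrix R of equation (2)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    return (
        (cb, 0.0, sb),
        (sa * sb, ca, -sa * cb),
        (-ca * sb, sa, ca * cb),
    )


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(point, rows, f, x0, y0):
    """Image coordinates of a point using x = f*(r1.P)/(r3.P) + x0, same for y."""
    r1, r2, r3 = rows
    depth = dot(r3, point)
    return f * dot(r1, point) / depth + x0, f * dot(r2, point) / depth + y0


def solve_xy(x, y, z, rows, f, x0, y0):
    """Solve equations (3a) and (3b) for (X, Y) given a constant height Z."""
    r1, r2, r3 = rows
    # (3a): ((x - x0) * r3 - f * r1) . P = 0, and likewise (3b) with r2 and y.
    a = [(x - x0) * r3[i] - f * r1[i] for i in range(3)]
    b = [(y - y0) * r3[i] - f * r2[i] for i in range(3)]
    det = a[0] * b[1] - a[1] * b[0]
    if abs(det) < 1e-12:
        raise ValueError("degenerate geometry: system is singular")
    rhs_a, rhs_b = -a[2] * z, -b[2] * z
    big_x = (rhs_a * b[1] - a[1] * rhs_b) / det
    big_y = (a[0] * rhs_b - rhs_a * b[0]) / det
    return big_x, big_y


def displacement(p0, p1):
    """Equations (3c) and (3d): X' = X(1) - X(0), Y' = Y(1) - Y(0)."""
    return p1[0] - p0[0], p1[1] - p0[1]
```

A round trip through `project` and `solve_xy` with the same rotation, focal length and reference point recovers the original (X, Y), which is a convenient sanity check for any implementation of this step.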

[0065] In one embodiment, camera control also includes adjusting the focal length based upon the calculated distance between camera 22 and the centroid of the target object, i.e., the POI. The destination focal length is assumed to be proportional to the distance between the POI and the camera. This distance, i.e., D(k), is found by the following equation:

$D(k) = \left\| P_{w}(k) \right\| = \sqrt{X(k)^{2} + Y(k)^{2} + Z^{2}}$

[0066] wherein:

[0067] P_(w)(k) represents the three dimensional location of the point in the world coordinate system;

[0068] X(k) is the distance of the POI from the focal point of the camera in the X direction in the real world;

[0069] Y(k) is the distance of the POI from the focal point of the camera in the Y direction in the real world; and

[0070] Z is the current focal length of the camera, i.e., the distance between the camera and the focal plane defined by the current zoom setting.

[0071] It is desired to keep this distance expressed in focal length units by use of the following:

D(k)=cf(k)

[0072] wherein:

[0073] f(k) is the focal length of the camera at time step k; and

[0074] c is a constant.

[0075] The focal length at each time step is computed using: $f(k) = \frac{D(k)}{c}$
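The relation f(k) = D(k)/c can be written as a short helper. This is a sketch only; the function name and the sample values are illustrative assumptions, and the constant c would in practice be chosen for the particular installation.

```python
import math


def destination_focal_length(x_k, y_k, z, c):
    """Return f(k) = D(k) / c with D(k) = sqrt(X(k)^2 + Y(k)^2 + Z^2)."""
    if c <= 0:
        raise ValueError("c must be a positive constant")
    distance = math.sqrt(x_k ** 2 + y_k ** 2 + z ** 2)
    return distance / c


# Example with arbitrary units: D(k) = 13, c = 2.
f_k = destination_focal_length(3.0, 4.0, 12.0, 2.0)
```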

[0076] With the current image projection of the POI given by (x_(c), y_(c)), the following holds: $\begin{matrix}{x_{c} = f\frac{X_{c}}{Z_{c}} + x_{0},\quad y_{c} = f\frac{Y_{c}}{Z_{c}} + y_{0} \Rightarrow \frac{X_{c}}{Z_{c}} = \frac{x_{c} - x_{0}}{f} = x_{cn},\quad \frac{Y_{c}}{Z_{c}} = \frac{y_{c} - y_{0}}{f} = y_{cn}} & (4)\end{matrix}$

[0077] wherein:

[0078] X_(c), Y_(c) and Z_(c) are the current real world coordinates of the POI; and

[0079] x_(cn) and y_(cn) are the horizontal and vertical distances, respectively, between the center of the image and the current image coordinates of the POI.

[0080] To achieve the desired or destination position of camera 22, it may also be necessary to rotate the camera about its pan and tilt axes. The rotation matrix given by equation (2) may be used to compute the desired position as follows: $x_{d} = f\frac{r_{1}^{T}P_{c}}{r_{3}^{T}P_{c}} + x_{0}$, $y_{d} = f\frac{r_{2}^{T}P_{c}}{r_{3}^{T}P_{c}} + y_{0}$

[0081] wherein x_(d) and y_(d) are the destination image coordinates ofthe POI.

[0082] or equivalently: $\begin{matrix}{x_{dn} = \frac{x_{d} - x_{0}}{f} = \frac{r_{1}^{T}P_{c}}{r_{3}^{T}P_{c}},\quad y_{dn} = \frac{y_{d} - y_{0}}{f} = \frac{r_{2}^{T}P_{c}}{r_{3}^{T}P_{c}}} & (5)\end{matrix}$

[0083] wherein x_(dn) and y_(dn) are the respective horizontal and vertical distances separating the two points (x₀, y₀) and (x_(d), y_(d)).

[0084] Combining equation (4) with equation (5) provides: $P_{c} = \left\lbrack X_{c}\;\; Y_{c}\;\; Z_{c} \right\rbrack^{T} = \left\lbrack \frac{X_{c}}{Z_{c}}\;\; \frac{Y_{c}}{Z_{c}}\;\; 1 \right\rbrack^{T}Z_{c} = \left\lbrack x_{cn}\;\; y_{cn}\;\; 1 \right\rbrack^{T}Z_{c} = Z_{c}P_{cn}^{T}$

[0085] After expansion, this equation may be written as:

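$x_{cn}\cos\beta + \sin\beta = x_{dn}\left( -x_{cn}\cos\alpha\sin\beta + y_{cn}\sin\alpha + \cos\alpha\cos\beta \right)$

$x_{cn}\sin\alpha\sin\beta + y_{cn}\cos\alpha - \sin\alpha\cos\beta = y_{dn}\left( -x_{cn}\cos\alpha\sin\beta + y_{cn}\sin\alpha + \cos\alpha\cos\beta \right)$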

[0086] wherein x_(cn) and y_(cn) are the camera coordinate equivalents of x_(dn) and y_(dn). The angles of rotation can then be found by iteratively solving these equations. The angles determined by this process represent the movement of the target object between the two consecutive images, I₁ and I₂, previously analyzed. As discussed above, the time interval between two such consecutive images is a substantially constant value and thus the angles determined by this process are target values which are a function of the velocity of the target object in the time interval between the acquisition of the two images. The determined angles are also a function of the original location of the target object relative to the camera, the acceleration of the object and the previous orientation of the camera.

Homography Matrix Method

[0087] An alternative method of determining a target value which may be used in the control of camera 22 to track the target object and which is representative of a property of the target object involves detecting corners in images I₁ and I₂. Corners are image points that have an intensity which significantly differs from neighboring points. Various methods of identifying and matching such corners from two images are known in the art.

[0088] One such known corner detection method is the MIC (minimum intensity change) corner detection method. The MIC corner detection method uses a corner response function (CRF) that gives a numerical value for the corner strength at a given pixel location. The CRF is computed over the image and corners are detected as points where the CRF achieves a local maximum. The CRF is computed using the following equation:

R=min (r _(A) ,r _(B))

[0089] wherein:

[0090] R is the CRF value;

[0091] r_(A) is the horizontal intensity variation; and

[0092] r_(B) is the vertical intensity variation.

[0093] The MIC method uses a three step process wherein the first step involves computing the CRF for each pixel in a low resolution image. Pixels having a CRF above a first threshold T₁ are identified as potential corners. This initial step will efficiently rule out a significant area of the image as non-corners because the low resolution of the image limits the number of pixels which require the computation of the CRF. The second step involves computing the CRF for the potential corner pixels using the full resolution image. If the resulting CRF is below a second threshold, T₂, the pixel is not a corner. For pixels which have a CRF which satisfies the second threshold, T₂, another interpixel approximation for determining an intensity variation for the pixel may also be computed and compared to a threshold value, e.g., T₂. If the response is below the threshold value, the pixel is not a corner. The third step involves locating pixels having locally maximal CRF values and labeling them corners. Nearby pixels having relatively high CRF values but which are not the local maximal value will not be labeled corners. Lists, PCL1 and PCL2, of the detected corners for images I₁ and I₂ respectively are then compiled and compared. The corners in the two images are compared/matched using a similarity measure such as a normalised cross-correlation (NCC) coefficient as is known in the art.
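The corner response function R = min(r_A, r_B) and the local-maximum step can be sketched at a single resolution as follows. The sketch assumes that each intensity variation is the sum of squared differences between the center pixel and its two opposite neighbors; that form, the single threshold and the function names are assumptions made for illustration, and the multi-resolution and interpixel steps are omitted.

```python
def corner_response(image, row, col):
    """CRF at an interior pixel: min of horizontal and vertical variation."""
    center = image[row][col]
    # Assumed form: squared differences to the two opposite neighbours.
    r_a = (image[row][col + 1] - center) ** 2 + (image[row][col - 1] - center) ** 2
    r_b = (image[row + 1][col] - center) ** 2 + (image[row - 1][col] - center) ** 2
    return min(r_a, r_b)


def detect_corners(image, threshold):
    """Return (row, col) of pixels whose CRF exceeds the threshold and is a
    strict local maximum within its 3x3 neighbourhood."""
    rows, cols = len(image), len(image[0])
    crf = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            crf[r][c] = corner_response(image, r, c)
    corners = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            value = crf[r][c]
            if value <= threshold:
                continue
            neighbours = [
                crf[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            if all(value > n for n in neighbours):
                corners.append((r, c))
    return corners


# Bright square occupying the lower-right part of a dark 6x6 image.
square = [[10 if r >= 3 and c >= 3 else 0 for c in range(6)] for r in range(6)]
found = detect_corners(square, threshold=50)
```

Taking the minimum of the two variations means that a pixel on a straight edge, where the intensity changes in only one direction, receives a zero response, while the corner of the square responds strongly.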

[0094] When camera 22 is adjusted between the acquisition of the two images I₁ and I₂, it is necessary, to detect the target object in the most recently acquired image, to align the images so that the background remains constant and that only objects displaying motion relative to the background are detected. The adjustment of camera 22 may take the form of panning movement, tilting movement or adjustment of the focal length of camera 22. Geometric transforms may be used to modify the position of each pixel within the image. Another way to think of this is as the moving of all pixels from one location to a new location based upon the camera motion. One such method for transforming a first image to align it with a second image wherein the camera was adjusted between the acquisition of the two images is discussed by Trajkovic in U.S. Pat. App. Pub. No. 2002/0167537 A1 entitled Motion-Based Tracking With Pan-Tilt-Zoom Camera which is hereby incorporated herein by reference.

[0095] Alignment of consecutive images requires translation, scaling and rotation of one image to align it with the previous image(s). Of these three operations translation is the simplest. Warping, a process in which each pixel is subjected to a general user-specified transformation, may be necessary to reduce, expand, or modify an image to a standard size before further processing can be performed. Images produced by such geometric operations are approximations of the original. The mapping between the two images, the current image I₁ and a reference image I₂, is defined by:

$p' = sQRQ^{-1}p = Mp$  (6)

[0096] where p and p′ denote the homogeneous image coordinates of the same world point in the first and second images, s denotes the scale factor (which corresponds to the focal length of the camera), Q is the internal camera calibration matrix, and R is the rotation matrix between the two camera locations.

[0097] Alternatively, the relationship between the image projection coordinates p and p′, i.e., pixel locations (x, y) and (x′, y′), of a stationary world point in two consecutive images may be written as: $\begin{matrix}{x' = \frac{m_{11}x + m_{12}y + m_{13}}{m_{31}x + m_{32}y + m_{33}}} & (7a) \\ {y' = \frac{m_{21}x + m_{22}y + m_{23}}{m_{31}x + m_{32}y + m_{33}}} & (7b)\end{matrix}$

[0098] where [m_(ij)]_(3×3) is the homography matrix M that maps (aligns) the first image to the second image.
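Equations (7a) and (7b) amount to a matrix-vector product in homogeneous coordinates followed by division by the third component. A minimal sketch follows, with the matrix stored as a nested list in row-major order; the function name and the example matrices are illustrative assumptions.

```python
def apply_homography(m, x, y):
    """Map pixel (x, y) to (x', y') using the 3x3 matrix m per (7a) and (7b)."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    if w == 0:
        raise ZeroDivisionError("point maps to infinity under this homography")
    x_new = (m[0][0] * x + m[0][1] * y + m[0][2]) / w
    y_new = (m[1][0] * x + m[1][1] * y + m[1][2]) / w
    return x_new, y_new


# A pure translation by (+5, -2) pixels expressed as a homography.
shift = [[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
moved = apply_homography(shift, 3.0, 4.0)
```

Because of the division by the common denominator, multiplying every entry of M by the same nonzero constant leaves the mapping unchanged, so M is only defined up to scale.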

[0099] The main problem of image alignment, therefore, is to determine the matrix M. From equation (6), it is clear that given s, Q and R it is theoretically straightforward to determine matrix M. In practice, however, the exact values of s, Q, and R are generally not known. Equation (6) assumes that the camera center and the center of rotation are identical, which is typically only approximately true. Additionally, in order to retrieve precise values of camera settings, i.e., pan and tilt values for determining R and zoom values for determining s, the camera must stop which will create unnatural motion and, depending on the system retrieving the camera settings, may take a considerable length of time.

[0100] The exemplary embodiment of the present invention computes the alignment matrix M directly from the images using equations (7a) and (7b) to avoid the necessity of acquiring information on the camera position and calibration. The point matching between the two images is performed by first taking a QCIF sub-sample of the two images I₁ and I₂ to obtain:

I₁ ¹, I₂ ¹

[0101] It is also possible to take a further QCIF sub-sample of the sub-sampled images to provide the following set of lower resolution images:

I₁ ¹, I₁ ², I₂ ¹, I₂ ²

[0102] The corners are then found in the low resolution images using the MIC corner detector described above. The homography matrix is then computed based upon a plurality of corresponding coordinates (x, y) and (x′, y′) in the low resolution image. Corner matching is then performed on the higher resolution image by finding the best corners around positions predicted by the homography matrix calculated using the low resolution images. A robust method such as the RANSAC algorithm which is known in the art may be used with the higher resolution images to identify “outlier” corner points which likely correspond to moving objects within the image. The “outlier” corner points identified by the RANSAC algorithm are not used in the calculation of the homography matrix using the higher resolution images to avoid the bias which would be introduced by using moving points in the calculation of the homography matrix. After removing the “outlier” corners using the RANSAC algorithm, the higher resolution images are used to calculate the homography matrix M.

[0103] The translation, rotation, and scaling of one image to align it with the second image can then be performed. A translation is a pixel motion in the x or y direction by some number of pixels. Positive translations are in the direction of increasing row or column index; negative ones are the opposite. A translation in the positive direction adds rows or columns to the top or left of the image until the required increase has been achieved. Image rotation is performed relative to an origin, defined to be at the center of the motion, and is specified as an angle. Scaling an image means making it bigger or smaller by a specified factor. The following approximation of equations (7a) and (7b) is used to represent such translation, rotation and scaling:

x′=s(x cos α−y sin α)+t _(x)

y′=s(x sin α+y cos α)+t _(y)  (8)

[0104] wherein

[0105] s is the scaling (zooming) factor;

[0106] α is the angle of rotation about the origin;

[0107] t_(x) is the translation in the x direction; and

[0108] t_(y) is the translation in the y direction.

[0109] By introducing new independent variables a₁=s cos α and a₂=s sin α, equation (8) becomes:

x′=a ₁ x−a ₂ y+t _(x)

y′=a ₂ x+a ₁ y+t _(y)

[0110] After determining a₁, a₂, t_(x) and t_(y), the two images, I₁ and I₂, can be aligned and the determination of the velocity and direction of the target object motion can be completed.
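One way to determine a₁, a₂, t_(x) and t_(y) from matched corner coordinates is an ordinary least-squares fit, which has a closed form for this four-parameter model. The sketch below assumes that outlier corners have already been removed; the least-squares formulation and the function names are illustrative assumptions rather than the method of the described embodiment.

```python
import math


def apply_similarity(a1, a2, tx, ty, point):
    """x' = a1*x - a2*y + tx,  y' = a2*x + a1*y + ty."""
    x, y = point
    return a1 * x - a2 * y + tx, a2 * x + a1 * y + ty


def fit_similarity(src, dst):
    """Least-squares estimate of (a1, a2, tx, ty) from matched point lists."""
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matched point pairs")
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    mxp = sum(p[0] for p in dst) / n
    myp = sum(p[1] for p in dst) / n
    num1 = num2 = den = 0.0
    for (x, y), (xp, yp) in zip(src, dst):
        xc, yc = x - mx, y - my          # centred source coordinates
        xpc, ypc = xp - mxp, yp - myp    # centred destination coordinates
        num1 += xc * xpc + yc * ypc
        num2 += xc * ypc - yc * xpc
        den += xc * xc + yc * yc
    if den == 0.0:
        raise ValueError("source points are all identical")
    a1, a2 = num1 / den, num2 / den
    tx = mxp - a1 * mx + a2 * my
    ty = myp - a2 * mx - a1 * my
    return a1, a2, tx, ty


def scale_and_angle(a1, a2):
    """Recover s and the rotation angle from a1 = s*cos(a), a2 = s*sin(a)."""
    return math.hypot(a1, a2), math.atan2(a2, a1)
```

Working with a₁ and a₂ instead of s and α keeps the model linear in its unknowns, which is what makes a direct solution possible.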

[0111] To create smooth camera motion, camera 22 is controlled in a manner which allows camera 22 to be constantly in motion. If the POI is to the left of the center of the field of view, processor 60 communicates a command to camera 22 which instructs camera 22 to pan left at a particular panning velocity or rate of adjustment. The panning velocity is determined by the distance the POI is from the center of the image. There is a linear relationship between the selected panning velocity and the distance between the center of the most recently acquired image and the POI in the horizontal or x direction. Similarly, the tilting rate and direction of camera 22 are determined by the vertical distance, i.e., in the y direction, between the POI and the center of the most recently acquired image. Proportionality factors are also applied to account for the distance of the target object from the camera.

[0112] The distance of the target object from the camera also influences the desired panning velocity. For a target object moving at a given speed in the x direction, the panning angle will have to be adjusted at a slower rate to track the object the more distant the object is from the camera. The distance of the target object from the camera also impacts the desired value of the camera tilt and focal length. Assuming a common height for all target objects and that the target objects are moving on a planar surface which is parallel to the panning plane, the tilt angle which places the target object in the center of the image will be determined by the distance of that object from the camera. Similarly, to maintain the target object at a given image height and assuming all target objects are the same height, the desired focal length of the camera will be determined by the distance of the target object from the camera.

[0113] In the exemplary embodiment, the panning and tilting velocities of camera 22 are determined by the following equations:

X _(vel)=(x _(delta) /x _(high))*sin(tilt angle)

Y _(vel)=(y _(delta) /y _(high))*sin(tilt angle)

[0114] wherein:

[0115] X_(vel) is the velocity or rate at which the panning angle is adjusted;

[0116] Y_(vel) is the velocity or rate at which the tilting angle is adjusted;

[0117] x_(delta) is the distance between the POI and the center of the image in the x direction;

[0118] y_(delta) is the distance between the POI and the center of the image in the y direction;

[0119] x_(high) and y_(high) are normalization factors; and sin(tilt angle) is the sine of the camera tilt angle (measured with reference to a horizontal plane) and provides a proportionality factor which is used to account for the target object distance from the camera. The resulting values X_(vel) and Y_(vel) are computed using the distance of the POI from the center of the image and the distance of the target object from the camera. As described above, the distance of the POI from the center of the image is related to the movement of the target object over a constant time value. Values X_(vel) and Y_(vel) are thus a function of several properties of the target object: its position relative to the camera in the real world and the position of the target object centroid within the FOV. Because the latter is a function of the velocity and acceleration of the target object, values X_(vel) and Y_(vel) are also functions of the velocity and acceleration of the target object.
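The two rate equations can be expressed as a single helper. In this sketch the tilt angle is assumed to be in radians, and a negative x_(delta) is assumed to mean that the POI lies to the left of the image center; the sign convention, the function name and the sample values are illustrative assumptions.

```python
import math


def pan_tilt_rates(x_delta, y_delta, x_high, y_high, tilt_angle_rad):
    """Return (X_vel, Y_vel) = (delta / high) * sin(tilt angle) for each axis."""
    factor = math.sin(tilt_angle_rad)  # accounts for target distance
    x_vel = (x_delta / x_high) * factor
    y_vel = (y_delta / y_high) * factor
    return x_vel, y_vel


# POI 80 pixels left of and 30 pixels below centre, camera looking straight down.
rates = pan_tilt_rates(-80, 30, 160, 120, math.pi / 2)
```

The magnitude of each value sets the adjustment rate and its sign sets the direction, so a POI at the image center yields zero rates and the camera holds its current orientation.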

[0120] A proportionality factor which is a function of the distance of the target object from the camera is used to adjust the selected panning and tilting adjustment rates because this distance impacts the effects of the panning and tilting adjustment of the camera. With regard to the panning motion of the camera, for example, when the target object is distant from the camera only minimal panning movement will be required to track movement of the target object in the x direction and maintain the target in the center of the image. If the target object is closer to the camera, the camera will be required to pan more quickly to track the target object if it were to move at the same speed in the x direction. Similarly, a higher rate of tilting is required to track targets which are closer to the camera than those which are more distant when such targets are moving at the same speed.

[0121] Additionally, the focal length adjustment rate and direction, i.e., how quickly to zoom camera 22 and whether to zoom in or out, is determined using the distance of the target object from the camera. The process described above for aligning two images having different scales, i.e., acquired at different focal lengths, allows for system 20 to utilize dynamic zooming, i.e., adjusting the focal length of camera 22 during the tracking of the target object instead of requiring the camera to maintain a constant zoom or focal length value during tracking or for acquiring compared images. In the exemplary embodiment, the largest detected moving object is selected as the target object provided that the size of the target object is larger than a predetermined threshold value, e.g., 10% of the field of view. Once tracking of the target object begins, the focal length of camera 22 is adjusted in a manner which attempts to maintain the target object between 10%-70% of the FOV. Tracking of the target may stop if the size of the object falls outside of this range. The focal length of camera 22 is adjusted to account for the distance of the target object from the camera with the goal of keeping the target object size relatively constant, e.g., 20% of the FOV, which facilitates the observation of the target object.

[0122] More specifically, the desired focal length is determined by first estimating the target distance between the target object and the camera as follows:

Target Distance=Camera Height/Sin(tilt angle)

[0123] wherein the tilt angle is determined with reference to a horizontal plane. Camera 22 is mounted at a known height and this height is input into tracking unit 50 during installation of system 20. Next, the resolution-limited FOV width (R−L FOV width) is calculated:

R−L FOV width=Number of effective pixels/Number of lines of resolution required to identify an intruder

[0124] wherein:

[0125] Number of effective pixels is 768(H) for NTSC video images and 752(H) for PAL video images; and

[0126] Number of lines of resolution required to identify an intruder is expressed in lines of resolution per foot, e.g., 16 lines per foot in the exemplary embodiment.

[0127] Then a desired focal length which will provide a sufficient number of lines of resolution to continue tracking of the target object is calculated:

Desired Focal Length=Format*Target Distance (ft)/R−L FOV width

[0128] wherein:

[0129] Format is the horizontal width in mm of the CCD (charge-coupled device) used by the camera, e.g., 3.6 mm for camera 22. In the illustrated embodiment, camera 22 is instructed to adjust its focal length setting by changing the focal length to the desired focal length value. The focal length adjustment of camera 22 is thus a point-to-point adjustment of the focal length. It would be possible in an alternative embodiment, however, for camera 22 to be commanded to move at a selected adjustment rate which is selected based upon the difference between the current focal length and the desired focal length, similar to the manner in which the pan and tilt adjustments are made, rather than to simply move to a given zoom setting. Camera 22 would then continue to adjust the focal length at the specified rate (and in the chosen direction, i.e., increasing or decreasing the focal length of the camera) until processor 60 communicated a second command altering the rate or direction of focal length adjustment. Such a second command could be to change the rate of change to 0 which would correspond to a constant focal length value.
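The three-step focal length calculation described above (target distance, resolution-limited FOV width, desired focal length) can be sketched as follows. The default values are the example values given in the text; the function name, the parameter names and the 20 ft mounting height used in the example are illustrative assumptions.

```python
import math

# Horizontal effective pixel counts stated for the two video standards.
EFFECTIVE_PIXELS = {"NTSC": 768, "PAL": 752}


def desired_focal_length_mm(camera_height_ft, tilt_angle_deg,
                            video_standard="NTSC", lines_per_foot=16.0,
                            format_mm=3.6):
    """Desired focal length = Format * Target Distance / R-L FOV width."""
    sin_tilt = math.sin(math.radians(tilt_angle_deg))
    if sin_tilt <= 0:
        raise ValueError("tilt angle must point below the horizontal plane")
    # Target Distance = Camera Height / sin(tilt angle), in feet.
    target_distance_ft = camera_height_ft / sin_tilt
    # R-L FOV width = effective pixels / lines of resolution per foot, in feet.
    rl_fov_width_ft = EFFECTIVE_PIXELS[video_standard] / lines_per_foot
    return format_mm * target_distance_ft / rl_fov_width_ft


# Camera mounted 20 ft high and tilted 30 degrees below horizontal.
focal_mm = desired_focal_length_mm(20.0, 30.0)
```

For this example the target distance is 40 ft and the resolution-limited FOV width is 48 ft, giving a desired focal length of approximately 3.0 mm for a 3.6 mm format.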

[0130] In summary, the video content analysis algorithm performs the following functions:

[0131] Tracker Initialization: The tracker is initialized to position the camera and wait for a moving target object to enter the camera FOV.

[0132] Background Subtraction: Images are compared to subtract the background and detect moving target objects.

[0133] Corner Detection and Matching: Corner features in the background are identified and matched to estimate changes in camera position between acquisition of the images.

[0134] Warping: Images are geometrically distorted to align images taken with differing fields of view and detect the moving target object in such images.

[0135] Region Location and Extraction: Locating the target object in each new frame involves locating and extracting the image region corresponding to the target object.

[0136] Point of Interest (POI) Computation: A simplified representation of the target object and its centroid is located within the two dimensional framework of the image.

[0137] Calculate adjustment rates for PTZ camera: Determine pan, tilt and focal length adjustment rates for camera and communicate commands to the camera.

[0138] FIG. 3 provides a flow chart which graphically illustrates the general logic of the video content analysis algorithm used by system 20 as described above and which uses the homography matrix approach instead of the rotation matrix approach to identify and track the target object. As shown in FIG. 3, after turning tracking unit 50 on, it is initialized at step 80 by loading a bootloader program from EEPROM 62 and copying the application code from flash memory 64 to SDRAM 66 for execution. Block 82 represents the remaining memory of SDRAM 66 which is available as a ring buffer for storage of video image frames for processing by processor 60. At decision block 84 processor 60 determines if the first flag is true. The first flag is true only when no images from camera 22 have been loaded to SDRAM 66 for analysis by processor 60. Thus, when tracking unit 50 is turned on, the first time decision block 84 is encountered, the first flag will be true and processor 60 will proceed to block 86. Block 86 represents the grabbing of two images by processor 60. Processor 60 then proceeds to block 88 where the current tilt value of camera 22 for each of the two images is obtained from the integral controller of camera 22 for later use to calculate the destination focal length.

[0139] Next, block 90 represents the taking of subsamples of the two most recently grabbed images. At block 92, the image difference of the two subsampled images is calculated to determine if any moving objects are present in the images. (If a moving object is found then the intruder tracking functionality of unit 50 is engaged, i.e., ITE Triggering.) If a moving object is present in the images, the centroid of the moving target object is located at block 94. A corner detection method is then used to detect corner features in the subsampled images and generate lists of such corners at block 96. Next, at block 98, the data for images I₁ and I₂ are swapped. The swapping of image data is done so that when a new image is grabbed and placed in the buffer after completing the calculations called for in steps 100-104, the new image and data associated therewith will overwrite the image and data associated with the older of the two images already present in the buffer. At block 100 the POI is calculated using the highest resolution images if the POI was determined using subsample images at block 94. The destination or desired focal length is then calculated at block 102. The pan and tilt velocities, X_(vel) and Y_(vel), are calculated at block 104. Next, at block 106, processor 60 communicates a command to camera 22 to adjust the focal length to the desired focal length; to pan at an adjustment rate and direction corresponding to the magnitude and sign of X_(vel); and to tilt at an adjustment rate and direction corresponding to the magnitude and sign of Y_(vel).

[0140] The process then returns to block 84 where the first flag will no longer be true and the process will proceed to block 108 where a single new image will be grabbed and overwrite image I₂ in the buffer. The tilt value of camera 22 for new image I₂ is then obtained at block 110 from the integral controller of camera 22 for later calculation of the desired focal length. The new image is then subsampled at block 112 and corners are detected and a list of such corners created for the subsampled images at block 114. The warping and alignment process described above is then performed at block 116 to align images I₁ and I₂. At block 118, the image difference of the two aligned images is then calculated to determine if a moving object is included in the images. If a moving target object is present in the images, the centroid of the target object is determined at block 120. At block 122 images I₁ and I₂ and the data associated therewith are swapped as described above with respect to block 98. At block 124 the size of the detected target object, i.e., the Blob_Size, is compared to a threshold value and, if the target object is not large enough, or if no target object has been found in the images, the process returns to block 84. If the target object is larger than the threshold size, the process continues on to blocks 100 through 106 where the adjustment parameters of camera 22 are determined and then communicated to camera 22 as described above.

[0141] In the illustrated embodiment, camera 22 may pan and tilt at different specified velocities, i.e., at selectively variable adjustment rates, and when processor 60 communicates a command to camera 22, processor 60 instructs camera 22 to pan in a selected direction and at a selected rate, to tilt in a selected direction and at a selected rate, and to change the focal length to a desired focal length. After receiving this first command, camera 22 will adjust by moving to the specified focal length and panning and tilting in the specified directions and at the specified rates until camera 22 receives a second command instructing it to pan in a new selected direction and at a new selected rate, to tilt in a new selected direction and at a new selected rate, and to change the focal length to a new desired focal length. The panning and tilting of camera 22 may also cease prior to receiving the second command if camera 22 has a limited panning or tilting range and reaches the limit of its panning or tilting range. By instructing camera 22 to pan and tilt in selected directions and at selected rates instead of instructing camera 22 to move to new pan and tilt orientations and then stop, camera 22 may be continuously adjusted during the tracking of the target object without stationary intervals separating the receipt and execution of the adjustment commands and thereby provide a stream of video images with relatively smooth transitional movements.

[0142] Thus, during operation of system 20, processor 60 may consecutively analyze a series of images which may all record different FOVs. As processor 60 analyzes images and repeatedly adjusts camera 22 to track the target object, the series of images may include three images consecutively analyzed by processor 60, i.e., first, second and third images, wherein each image records a different FOV. Processor 60 will have communicated a previous, first command to camera 22 based upon earlier images, and camera 22 will be adjusted in accordance with this first command as processor 60 analyzes the first and second images. The analysis of the first and second images will result in a second command to camera 22, and camera 22 will be adjusted in accordance with this second command as processor 60 analyzes the second and third images to formulate the next adjustment command for camera 22. As described above, camera 22 will continue to pan and tilt in accordance with the first command until receipt of the second command. In this manner, camera 22 may be continuously adjusted as it acquires a series of images having different fields of view without requiring stationary intervals for the acquisition of images having common FOVs or separating the execution of adjustment commands.

[0143] The video content analysis algorithm described above assumes that camera 22 is mounted at a known height and works best when the surveillance area and target objects conform to several characteristics. For best results, the target should be 30% to 70% of the image height, have a height to width ratio of no more than 5:1 and move less than 25% of the image width between processed frames at a constant velocity. System 20 tracks only one moving target at a time. If multiple targets are within the FOV, system 20 will select the largest target if it is at least 20% larger than the next largest target. If the largest target is not at least 20% larger than the next largest target, system 20 may change targets randomly. Alternative target object identification methods may also be used to distinguish between moving objects, such as those analyzing the color histogram of the target object. It is best if the area of interest is within 1 standard deviation of the mean intensity of the surrounding environment. Best results are also obtained when the plane of the target motion is parallel to the panning plane. System 20 uses background features to detect “corners” and register subsequent images; therefore, it may fail in excessively featureless environments or if targets occupy a majority of the FOV and obscure such corner features. Divergence from these assumptions and characteristics is not necessarily fatal to the operation of system 20 and may merely degrade performance of system 20. These assumptions concerning the illustrated embodiment cover a large subset of video surveillance applications related to restricted areas where people are not supposed to be present. It is also possible for those having ordinary skill in the art to adapt illustrated system 20 to cover additional situations which are not necessarily limited to these assumptions and characteristics.
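
The rule for choosing among multiple moving targets may be illustrated with the following sketch. It assumes that target size is measured as a blob pixel count, which the description above does not specify, and the function name and the `margin` parameter are hypothetical.

```python
def select_target(blob_sizes, margin=0.20):
    """Return the index of the largest blob if it is at least `margin`
    larger than the runner-up; return None when the choice is ambiguous."""
    if not blob_sizes:
        return None
    # Indices ordered from the largest blob to the smallest.
    order = sorted(range(len(blob_sizes)),
                   key=lambda i: blob_sizes[i], reverse=True)
    if len(order) == 1:
        return order[0]
    largest = blob_sizes[order[0]]
    runner_up = blob_sizes[order[1]]
    if largest >= (1.0 + margin) * runner_up:
        return order[0]
    # Ambiguous case: per the description, the system may change
    # targets randomly; this sketch just reports the ambiguity.
    return None
```

For example, with blob sizes of 100, 130 and 90 pixels the second blob would be selected, whereas sizes of 100 and 110 pixels would be reported as ambiguous.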

[0144] As shown in FIG. 4, tracking unit 50 has three main states: 1) Tracker OFF, 2) Looking for Target and 3) Tracking Target. Tracking unit 50 is turned on and off by a human operator inputting commands through an input device such as keyboard 34 or joystick 36. The on/off commands are routed through bi-phase cable 46 to camera 22 and an RS-232 line to tracking unit 50. Tracking unit 50 communicates its current status with LED indicators 70, 72 and relay 74. For example, LED 70 emits light when unit 50 is on and flashes when unit 50 is tracking a target object. When unit 50 is tracking a target object, relay 74 communicates this information to head end unit 32 via relay line 49. LED 72 emits light when unit 50 is turned on but has experienced an error such as the loss of the video signal.

[0145] In the exemplary embodiment, if tracking unit 50 is on, either looking for a target or tracking a target, and a higher priority activity is initiated, tracking unit 50 will turn off or become inactive. After the higher priority activity has ceased and a dwell time has elapsed, i.e., the higher priority activity has timed out, tracking unit 50 will turn back on and begin looking for a target.

TRACKING UNIT ACTIVITY (PRIORITY RANKING)      ACTION
Joy Stick Movement (1)                         Tracker changes to OFF status
Camera Initiated Movement (2)                  Tracker changes to OFF status
Timing Out of Camera Initiated Movement (3)    Tracker changes to Looking for Target status
Timing Out of Joystick Movement (3)            Tracker changes to Looking for Target status
On Command from Head End Unit (4)              Tracker changes to Looking for Target status
Off Command from Head End Unit (4)             Tracker changes to OFF status
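
The activity ranking of paragraph [0145] may be expressed as a small lookup, sketched below. The three state names follow FIG. 4 as described above. The activity keys, the `resolve` function, the reading of rank 1 as the highest priority, and the rule that the highest-ranked pending activity determines the resulting state are all assumptions made for illustration.

```python
from enum import Enum


class TrackerState(Enum):
    OFF = "Tracker OFF"
    LOOKING = "Looking for Target"
    TRACKING = "Tracking Target"


# (priority ranking, resulting state) for each activity, transcribed from
# the ranking in paragraph [0145]. The key names are hypothetical.
ACTIVITY_TABLE = {
    "joystick_movement": (1, TrackerState.OFF),
    "camera_initiated_movement": (2, TrackerState.OFF),
    "camera_movement_timeout": (3, TrackerState.LOOKING),
    "joystick_timeout": (3, TrackerState.LOOKING),
    "head_end_on": (4, TrackerState.LOOKING),
    "head_end_off": (4, TrackerState.OFF),
}


def resolve(activities):
    """Return the state dictated by the highest-ranked pending activity.

    Assumes rank 1 outranks rank 4; ties go to the first activity listed.
    """
    winner = min(activities, key=lambda name: ACTIVITY_TABLE[name][0])
    return ACTIVITY_TABLE[winner][1]
```

Under these assumptions, a joystick movement that coincides with an On command from the head end unit leaves the tracker in the OFF state, because the joystick movement carries the higher ranking.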

[0146] In alternative embodiments, tracking unit 50 may give up control of camera 22 during human operator and/or camera initiated movement of camera 22 and continue to analyze the images acquired by camera 22 to detect target objects. The continued detection of target objects while the camera is under the control of an operator or separate controller is possible because tracking unit 50 does not require the images used to detect the target object to be acquired while the camera is stationary or for the images to each have the same field of view.

[0147] Once tracking unit 50 has detected a target object, it will continuously track the target object until it can no longer locate the target object. For example, the target object may leave the area which is viewable by camera 22 or may be temporarily obscured by other objects in the FOV. When unit 50 first loses the target object it will enter into a reacquisition subroutine. If the target object is reacquired, tracking unit 50 will continue tracking the target object. If the target has not been found before the completion of the reacquisition subroutine, tracking unit 50 will change its status to Looking for Target and control of the camera position will be returned to either the camera controller or the human operator. The reacquisition subroutine is graphically illustrated by the flow chart of FIG. 5. In the reacquire mode, tracking unit 50 first keeps the camera at the last position in which the target was tracked for approximately 10 seconds. If the target is not reacquired, the camera is zoomed out in discrete increments wherein the maximum zoom in capability of the camera corresponds to 100% and no zoom (i.e., no magnifying effect) corresponds to 0%. More specifically, the camera is zoomed out to the next lowest increment of 20% and looks for the target for approximately 10 seconds in this new FOV. The camera continues to zoom out in 20% increments at 10 second intervals until the target is reacquired or the camera reaches its minimum zoom (0%) setting. After 10 seconds at the minimum zoom setting, if the target has not been reacquired, the status of tracking unit 50 is changed to “Looking for Target”, the position of camera 22 returns to a predefined position or “tour” and the positional control of the camera is returned to the operator or the controller embedded within camera 22.
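
The zoom-out schedule of the reacquisition subroutine may be sketched as follows. The sketch assumes integer zoom percentages and reads "the next lowest increment of 20%" as the next lower multiple of 20, which is one possible interpretation. The function names and the `target_visible` callback are hypothetical stand-ins for the detection process described above.

```python
def reacquisition_schedule(current_zoom_percent, step=20, dwell_seconds=10):
    """Return the (zoom_percent, dwell_seconds) stages searched in order."""
    # First stage: hold the last tracked position for the dwell time.
    stages = [(current_zoom_percent, dwell_seconds)]
    zoom = current_zoom_percent
    while zoom > 0:
        # Drop to the next lower multiple of `step` (assumed interpretation).
        zoom = ((zoom - 1) // step) * step
        stages.append((zoom, dwell_seconds))
    return stages


def reacquire(current_zoom_percent, target_visible):
    """Walk the schedule and return (resulting status, final zoom percent).

    `target_visible(zoom)` is a hypothetical callback that reports whether
    the target was found during the dwell at that zoom setting.
    """
    for zoom, _dwell in reacquisition_schedule(current_zoom_percent):
        if target_visible(zoom):
            return "Tracking Target", zoom
    # Minimum zoom reached without success: hand control back.
    return "Looking for Target", 0
```

Starting from a 70% zoom setting, this schedule visits 70%, 60%, 40%, 20% and 0%, so the subroutine would give up after roughly 50 seconds if the target is never found.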

[0148] As described above, system 20 uses a general purpose video processing platform that obtains video and camera control information from a standard PTZ camera. This configuration and use of a standard PTZ camera also allows for the retrofitting and upgrading of existing installations having installed PTZ cameras by installing tracking units 50 and coupling tracking units 50 with the existing PTZ cameras. A system which could be upgraded by the addition of one or more tracking units 50 is discussed by Sergeant et al. in U.S. Pat. No. 5,517,236 which is hereby incorporated herein by reference. Providing tracking units 50 with a sheet metal housing facilitates their mounting on or near a PTZ camera to provide for PTZ control using image processing of the source video. System 20 thereby provides a stand alone embedded platform which does not require a personal computer-based tracking system.

[0149] The present invention can be used in many environments where it is desirable to have video surveillance capabilities. For example, system 20 may be used to monitor manufacturing and warehouse facilities and track individuals who enter restricted areas. Head end unit 32 with display 38 and input devices 34 and 36 may be positioned at a location remote from the area being surveyed by camera 22 such as a guard room at another location in the building. Although system 20 includes a method for automatically detecting a target object, the manual selection of a target object by a human operator, such as by the operation of joystick 36, could also be employed with the present invention. After manual selection of the target object, system 20 would track the target object as described above for target objects identified automatically.

[0150] While this invention has been described as having an exemplary design, the present invention may be further modified within the spirit and scope of this disclosure. This application is therefore intended to cover any variations, uses, or adaptations of the invention using its general principles.

What is claimed is:
 1. A video tracking system comprising: a video camera having a field of view, said camera being selectively adjustable wherein adjustment of said camera varies the field of view of said camera; and at least one processor operably coupled to said camera wherein said processor receives video images acquired by said camera and selectively adjusts said camera; said processor programmed to detect a moving target object in said video images and adjust said camera to track said target object, said processor adjusting said camera at a plurality of varied adjustment rates.
 2. The video tracking system of claim 1 wherein said processor selects the adjustment rate of said camera as a function of at least one property of the target object.
 3. The video tracking system of claim 2 wherein the at least one property of the target object includes a velocity of the target object.
 4. The tracking system of claim 2 wherein said processor is programmed to select the adjustment rate of said camera based upon analysis of a first image and a second image wherein said first image is acquired by said camera adjusted to define a first field of view and said second image is acquired by said camera adjusted to define a second field of view.
 5. The tracking system of claim 4 wherein said first and second fields of view are partially overlapping and wherein determination of said selected adjustment rate by said processor includes identifying and aligning at least one common feature represented in each of said first and second images.
 6. The tracking system of claim 1 wherein said camera has a selectively adjustable focal length and said processor selects the focal length of said camera as a function of the distance of the target object from said camera.
 7. The tracking system of claim 1 wherein said camera is adjusted at a first selected adjustment rate until said processor selects a second adjustment rate and communicates said second adjustment rate to said camera.
 8. The tracking system of claim 4 wherein said camera defines a third field of view as said camera is being adjusted at said selected adjustment rate and wherein a third image is acquired by said camera when defining said third field of view, said first, second and third images being consecutively analyzed by said processor.
 9. The tracking system of claim 1 wherein said camera is selectively adjustable at a variable rate in adjusting at least one of a panning orientation of said camera and a tilt orientation of said camera.
 10. The tracking system of claim 1 wherein selective adjustment of said camera includes selective panning movement of said camera, said panning movement defining an x-axis, selective tilting movement of said camera, said tilting movement defining a y-axis, and selective focal length adjustment of said camera, adjustment of the focal length defining a z-axis, said x, y and z axes oriented mutually perpendicular.
 11. The tracking system of claim 10 wherein said processor adjusts said camera at a selected panning rate, said selected panning rate being a function of the velocity of said target object along said x-axis and said processor adjusts said camera at a selected tilting rate, said selected tilting rate being a function of the velocity of said target object along said y-axis.
 12. The tracking system of claim 1 further comprising a display device and an input device operably coupled to said system wherein an operator may view said video images on said display device and input commands or data into said system through said input device, said display device and input device being positionable remote from said camera.
 13. A video tracking system comprising: a video camera having a field of view, said camera being selectively adjustable wherein adjustment of said camera varies the field of view of said camera; and at least one processor operably coupled to said camera wherein said processor receives video images acquired by said camera and selectively adjusts said camera; said processor programmed to detect a moving target object in said video images and estimate a target value, said target value being a function of a property of said target object, said processor adjusting said camera at a selected adjustment rate, said selected adjustment rate being a function of said target value.
 14. The video tracking system of claim 13 wherein said camera is selectively adjustable at a variable rate in adjusting at least one of a panning orientation of said camera and a tilt orientation of said camera.
 15. The tracking system of claim 13 wherein selective adjustment of said camera includes selective panning movement of said camera, said panning movement defining an x-axis, selective tilting movement of said camera, said tilting movement defining a y-axis, and selective focal length adjustment of said camera, adjustment of the focal length defining a z-axis, said x, y and z axes oriented mutually perpendicular.
 16. The tracking system of claim 15 wherein said processor adjusts said camera at a selected panning rate, said selected panning rate being a function of the velocity of said target object along said x-axis and said processor adjusts said camera at a selected tilting rate, said selected tilting rate being a function of the velocity of said target object along said y-axis.
 17. The tracking system of claim 13 wherein said processor is programmed to estimate said target value based upon a first image and a second image wherein said first image is acquired by said camera adjusted to define a first field of view and said second image is acquired by said camera adjusted to define a second field of view.
 18. The tracking system of claim 17 wherein said first and second fields of view are partially overlapping and wherein determination of said selected adjustment rate by said processor includes identifying and aligning at least one common feature represented in each of said first and second images.
 19. The tracking system of claim 17 wherein said camera is adjusted at a first selected adjustment rate until said processor selects a second adjustment rate and communicates said second adjustment rate to said camera.
 20. The tracking system of claim 19 wherein said camera defines a third field of view as said camera is adjusted at said selected adjustment rate and wherein a third image is acquired by said camera when defining said third field of view, said first, second and third images being consecutively analyzed by said processor.
 21. The tracking system of claim 13 wherein said camera has a selectively adjustable focal length and said processor selects the focal length of said camera as a function of the distance of the target object from said camera.
 22. The tracking system of claim 13 further comprising a display device and an input device operably coupled to said system wherein an operator may view said video images on said display device and input commands or data into said system through said input device, said display device and input device being positionable remote from said camera.
 23. A video tracking system comprising: a video camera having a field of view, said camera being selectively adjustable wherein adjustment of said camera varies the field of view of said camera; and at least one processor operably coupled to said camera wherein said processor receives video images acquired by said camera and selectively adjusts said camera; said processor programmed to detect a moving target object in said video images and adjust said camera and track said target object and wherein during tracking of the target object said processor communicates a plurality of commands to said camera, said camera being continuously and variably adjustable in accordance with said commands without an intervening stationary interval.
 24. The video tracking system of claim 23 wherein said camera is selectively adjustable at a variable rate in adjusting at least one of a panning orientation of said camera and a tilt orientation of said camera.
 25. The tracking system of claim 23 wherein said commands include a first command adjusting said camera at a selected rate and direction until a second command is received by said camera.
 26. The tracking system of claim 25 wherein said processor adjusts said camera at a selectively variable panning rate and at a selectively variable tilting rate.
 27. The tracking system of claim 23 wherein said camera acquires images for analysis by said processor while being adjusted.
 28. The tracking system of claim 23 wherein continuous and variable adjustment of said camera includes varying one of a direction of adjustment and a rate of adjustment.
 29. A video tracking system comprising: a video camera having a field of view, said camera being selectively adjustable wherein adjustment of said camera varies the field of view of said camera; and at least one processor operably coupled to said camera wherein said processor receives video images acquired by said camera and selectively adjusts said camera; said processor programmed to detect a moving target object in said video images and adjust said camera and track said target object wherein said processor consecutively analyzes first, second and third images acquired by said camera, each of said images recording a different field of view, said processor communicating to said camera a first command selectively adjusting said camera and a second command selectively adjusting said camera; said camera being adjusted in accordance with said first command during at least a portion of a first time interval between acquisition of said first and second images, said camera being adjusted in accordance with said second command during at least a portion of a second time interval between acquisition of said second and third images and wherein said camera is continuously adjusted between acquisition of said first image and said third image.
 30. The video tracking system of claim 29 wherein said camera is selectively adjustable at a variable rate in adjusting at least one of a panning orientation of said camera and a tilt orientation of said camera.
 31. The tracking system of claim 29 wherein said first command adjusts said camera at a selected rate and direction until said second command is received by said camera.
 32. A method of tracking a target object with a video camera, said method comprising: providing a video camera having a field of view, said camera being selectively adjustable wherein adjustment of said camera varies the field of view of said camera; and adjusting said camera at a selectively variable adjustment rate to track a target object.
 33. The method of claim 32 wherein said camera is adjusted at an adjustment rate which is selected as a function of at least one property of the target object.
 34. The method of claim 33 wherein the at least one property of the target object includes a velocity of the target object.
 35. The method of claim 33 wherein said adjustment rate is selected based upon analysis of a first image and a second image wherein said first image is acquired by said camera adjusted to define a first field of view and said second image is acquired by said camera adjusted to define a second field of view.
 36. The method of claim 35 wherein said first and second fields of view are partially overlapping and wherein determination of said adjustment rate includes identifying and aligning at least one common feature represented in each of said first and second images.
 37. The method of claim 35 wherein determination of said adjustment rate includes the use of a proportionality factor which is a function of the real world distance between the target object and said camera.
 38. The method of claim 32 wherein said camera is adjusted at a first selected adjustment rate until said processor selects a second adjustment rate and communicates said second adjustment rate to said camera.
 39. The method of claim 32 wherein adjusting said camera at a selectively variable adjustment rate comprises adjusting at least one of a panning orientation of said camera and a tilt orientation of said camera.
 40. The method of claim 32 wherein said camera is selectively adjustable at a variable rate in adjusting each of a panning orientation of said camera and a tilt orientation of said camera.
 41. The method of claim 32 wherein said camera is selectively adjustable at a variable rate in adjusting each of a panning orientation of said camera and a tilt orientation of said camera, and wherein each of said variable adjustment rates is selected as a function of the velocity of the target object.
 42. A method of tracking a target object with a video camera, said method comprising: providing a video camera having a field of view, said camera being selectively adjustable wherein adjustment of said camera varies the field of view of said camera; detecting a target object in images acquired by said camera; estimating a target value which is a function of at least one property of the target object; and adjusting said camera at a selectively variable rate wherein said adjustment rate of said camera is selected as a function of said target value.
 43. The method of claim 42 wherein the at least one property of the target object includes a velocity of the target object.
 44. The method of claim 42 wherein adjusting said camera at a selectively variable adjustment rate includes selecting said adjustment rate based upon analysis of a first image and a second image wherein said first image is acquired by said camera adjusted to define a first field of view and said second image is acquired by said camera adjusted to define a second field of view.
 45. The method of claim 44 wherein said first and second fields of view are partially overlapping and wherein determination of said adjustment rate includes identifying and aligning at least one common feature represented in each of said first and second images.
 46. The method of claim 42 wherein the camera has a selectively adjustable focal length and the method further comprises adjusting the focal length of said camera as a function of the distance of the target object from the camera.
 47. The method of claim 42 wherein adjusting the camera further comprises adjusting the camera at a first selected adjustment rate until a second selected adjustment rate is communicated to the camera.
 48. The method of claim 42 wherein adjusting said camera at a selectively variable adjustment rate comprises adjusting at least one of a panning orientation of said camera and a tilt orientation of said camera.
 49. The method of claim 42 wherein adjusting said camera at a selectively variable adjustment rate includes selectively adjusting at a variable rate each of a panning orientation of said camera and a tilt orientation of said camera.
 50. The method of claim 42 wherein the step of adjusting said camera includes selecting a first adjustment rate and direction for adjusting the camera and continuing to adjust the camera at the first adjustment rate and direction until a second adjustment rate and direction are selected.
 51. A method of tracking a target object with a video camera, said method comprising: providing a video camera having a field of view, said camera being selectively adjustable wherein adjustment of said camera varies the field of view of said camera; and adjusting said camera to track a target object wherein said adjustment of said camera includes selectively and variably adjusting at least one adjustment parameter and wherein said camera is continuously adjustable during said selective and variable adjustment of said at least one adjustment parameter.
 52. The method of claim 51 wherein selectively and variably adjusting said at least one adjustment parameter of said camera includes the adjustment of at least one of a panning orientation of said camera and a tilt orientation of said camera.
 53. The method of claim 51 wherein selectively and variably adjusting said at least one adjustment parameter of said camera includes adjusting said camera at a selectively variable rate in the adjustment of at least one of a panning orientation of said camera and a tilt orientation of said camera.
 54. The method of claim 51 wherein selectively and variably adjusting said at least one adjustment parameter of said camera includes adjusting said camera at a selectively variable rate in the adjustment of each of a panning orientation of said camera and a tilt orientation of said camera.
 55. The method of claim 54 wherein said selective and variable adjustment of said at least one adjustment parameter includes varying one of a direction of adjustment and a rate of adjustment.
 56. The method of claim 54 wherein at least one of the adjustment parameters is adjusted at a variable adjustment rate selected as a function of the velocity of the target object.
 57. A method of tracking a target object with a video camera, said method comprising: providing a video camera having a field of view, said camera being selectively adjustable wherein adjustment of said camera varies the field of view of said camera; detecting a target object in images acquired by said camera; acquiring first, second and third images, each of said first, second and third images recording a different field of view; communicating a first command to said camera selectively adjusting said camera; communicating a second command to said camera selectively adjusting said camera; and continuously adjusting said camera between acquisition of said first image and acquisition of said third image wherein said camera is adjusted in accordance with said first command during at least a portion of a first time interval between acquisition of said first image and acquisition of said second image and said camera is adjusted in accordance with said second command during at least a portion of a second time interval between acquisition of said second image and acquisition of said third image.
 58. The method of claim 57 wherein said first and second commands selectively adjust at least one of a panning orientation of said camera, a tilt orientation of said camera, and a focal length of said camera.
 59. The method of claim 57 wherein said first and second commands selectively adjust said camera at a selectively variable adjustment rate in the adjustment of at least one of a panning orientation of said camera and a tilt orientation of said camera.
 60. The method of claim 57 wherein said first and second commands select a variable adjustment rate for each of a panning orientation of said camera and a tilt orientation of said camera.
 61. The method of claim 60 wherein at least one of the variable adjustment rates is selected as a function of the velocity of the target object.
 62. A video tracking system comprising: a video camera having a selectively adjustable focal length; and at least one processor operably coupled to said camera wherein said processor receives video images acquired by said camera and selectively adjusts the focal length of said camera; said processor programmed to detect a moving target object in said video images and adjust the focal length of said camera as a function of the distance of the target object from the camera.
 63. The video tracking system of claim 62 wherein said camera has a selectively adjustable panning orientation and a selectively adjustable tilting orientation; said processor adjusting said panning orientation and said tilting orientation to maintain the target object centered in the video images and wherein said processor selectively adjusts the focal length of said camera as a function of the tilt angle.
 64. A method of automatically tracking a target object with a video camera, said method comprising: providing a video camera having a selectively adjustable focal length; and adjusting the focal length of the camera as a function of the distance of the target object from the camera.
 65. The method of claim 64 wherein the camera has a selectively adjustable panning orientation and a selectively adjustable tilting orientation and said method further includes adjusting the panning and tilting orientation of the camera to track the target object and selectively adjusting the focal length of the camera as a function of the tilt angle of the camera.